Sorting and Permuting without Bank Conflicts on GPUs

Authors

  • Peyman Afshani
  • Nodari Sitchinava
Abstract

In this paper, we look at the complexity of designing algorithms without any bank conflicts in the shared memory of Graphical Processing Units (GPUs). Given input of size $n$, $w$ processors and $w$ memory banks, we study three fundamental problems: sorting, permuting and $w$-way partitioning (defined as sorting an input containing exactly $n/w$ copies of every integer in $[w]$). We solve sorting in optimal $O(\frac{n}{w} \log n)$ time. When $n \ge w^2$, we solve the partitioning problem optimally in $O(n/w)$ time. We also present a general solution for the partitioning problem which takes $O(\frac{n}{w} \log^3_{n/w} w)$ time. Finally, we solve the permutation problem using a randomized algorithm in $O(\frac{n}{w} \log\log\log_{n/w} n)$ time. Our results show evidence that when working with banked memory architectures, there is a separation between these problems and the permutation and partitioning problems are not as easy as simple parallel scanning.
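
As a rough illustration of what a bank conflict is, the sketch below counts how many simultaneous accesses land in the same bank. It is not taken from the paper: the mapping of address a to bank a mod w, the helper name conflict_degree, and the value w = 4 are all assumptions made for the example.

```python
from collections import Counter


def conflict_degree(addresses, w):
    """Return the largest number of addresses that fall in the same bank.

    Assumes address a lives in bank a % w (an illustrative mapping,
    not something stated in the abstract). A result of 1 means the
    access step is free of bank conflicts.
    """
    banks = Counter(a % w for a in addresses)
    return max(banks.values())


w = 4  # hypothetical number of processors and memory banks

# Each of the w processors reads one element of the same column of a
# w x w row-major matrix: addresses 0, w, 2w, ... all map to bank 0.
column_access = [i * w for i in range(w)]

# Same column read, but every row is padded to w + 1 entries, so
# consecutive rows start in different banks.
padded_access = [i * (w + 1) for i in range(w)]

print(conflict_degree(column_access, w))  # 4: all accesses hit one bank
print(conflict_degree(padded_access, w))  # 1: conflict-free
```

Under this assumed mapping, a conflict degree of w means the w accesses would be serialized, while a degree of 1 lets all w processors proceed in a single step.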

Similar resources

Provably Efficient GPU Algorithms

In this paper we present an abstract model for algorithm design on GPUs by extending the parallel external memory (PEM) model with computations in internal memory (commonly known as shared memory in GPU literature) defined in the presence of memory banks and bank conflicts. We also present a framework for designing bank conflict free algorithms on GPUs. Using our framework we develop the first ...

An Efficient Multiway Mergesort for GPU Architectures

Sorting is a primitive operation that is a building block for countless algorithms. As such, it is important to design sorting algorithms that approach peak performance on a range of hardware architectures. Graphics Processing Units (GPUs) are particularly attractive architectures as they provide massive parallelism and computing power. However, the intricacies of their compute and memory hier...

A Novel Computational Model for GPUs with Applications to Efficient Algorithms

We propose a novel computational model for GPUs. Known parallel computational models such as the PRAM model are not appropriate for evaluating GPU-based algorithms. Our model, called AGPU, abstracts the essence of current GPU architectures such as global and shared memory, memory coalescing and bank conflicts. Using our model, we can evaluate asymptotic behavior of GPU algorithms more efficient...

Increasing Throughput Performance with Arbitrary Modulus Indexing

Throughput architectures such as GPUs require parallel accesses to memory throughout the memory system to feed the massive numbers of executing threads. Within a single streaming processor, the primary memory system of a modern GPU can supply a peak throughput of dozens of memory accesses per cycle. Simultaneous access leads to memory-level conflicts between different threads, inhibiting perfor...

Optimizing CUDA Shared Memory Usage

CUDA shared memory is fast, on-chip storage. However, the bank conflict issue could cause a performance bottleneck. Current NVIDIA Tesla GPUs support memory bank accesses with configurable bit-widths. While this feature provides an efficient bank mapping scheme for 32-bit and 64-bit data types, it becomes trickier to solve the bank conflict problem through manual code tuning. This paper present...

Publication date: 2015